Multi-Fidelity Multi-Armed Bandits Revisited
We study the multi-fidelity multi-armed bandit (MF-MAB), an extension of the
canonical multi-armed bandit (MAB) problem. MF-MAB allows each arm to be pulled
with different costs (fidelities) and observation accuracy. We study both the
best arm identification with fixed confidence (BAI) and the regret minimization
objectives. For BAI, we present (a) a cost complexity lower bound, (b) an
algorithmic framework with two alternative fidelity selection procedures, and
(c) both procedures' cost complexity upper bounds. From both cost complexity
bounds of MF-MAB, one can recover the standard sample complexity bounds of the
classic (single-fidelity) MAB. For regret minimization of MF-MAB, we propose a
new regret definition, prove its problem-independent regret lower bound
$\Omega(K^{1/3}\Lambda^{2/3})$ and problem-dependent lower bound
$\Omega(K \log \Lambda)$, where $K$ is the number of arms and $\Lambda$ is the
decision budget in terms of cost, and devise an elimination-based algorithm
whose worst-cost regret upper bound matches its corresponding lower bound up to
some logarithmic terms and whose problem-dependent bound matches its
corresponding lower bound in terms of $\Lambda$.
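
To make the setting concrete, here is a minimal Python sketch of a multi-fidelity bandit with an elimination-style loop under a cost budget. The names (MFBandit, eliminate) and the schedule for raising fidelity are illustrative assumptions, not the paper's algorithm or its cost-complexity guarantees.

```python
# Illustrative sketch only; not the paper's algorithm.
import math
import random

class MFBandit:
    """K arms; pulling at fidelity m costs costs[m] and returns the true mean
    plus Gaussian noise with standard deviation sigmas[m]."""
    def __init__(self, means, costs, sigmas):
        self.means, self.costs, self.sigmas = means, costs, sigmas

    def pull(self, arm, fidelity):
        # Higher fidelity: larger cost, smaller observation noise.
        reward = self.means[arm] + random.gauss(0.0, self.sigmas[fidelity])
        return reward, self.costs[fidelity]

def eliminate(bandit, budget):
    """Generic elimination loop under a cost budget (illustrative only)."""
    K = len(bandit.means)
    active = list(range(K))
    est, pulls = [0.0] * K, [0] * K
    spent, fidelity = 0.0, 0  # start cheap and noisy, refine later
    while spent < budget and len(active) > 1:
        for arm in active:
            reward, cost = bandit.pull(arm, fidelity)
            pulls[arm] += 1
            est[arm] += (reward - est[arm]) / pulls[arm]  # running mean
            spent += cost
        # Confidence radius shrinks with pulls; drop clearly worse arms.
        rad = math.sqrt(2.0 * math.log(max(budget, 2.0)) / pulls[active[0]])
        best = max(est[a] for a in active)
        active = [a for a in active if est[a] + 2.0 * rad >= best]
        # Spend the remaining budget at higher (costlier, cleaner) fidelities.
        fidelity = min(fidelity + 1, len(bandit.costs) - 1)
    return max(active, key=lambda a: est[a])

# Example: 3 arms, 2 fidelities (cheap/noisy vs. costly/accurate).
bandit = MFBandit(means=[0.2, 0.5, 0.8], costs=[1.0, 4.0], sigmas=[2.0, 0.5])
print(eliminate(bandit, budget=500.0))
```

The fidelity schedule here simply moves up one level per elimination round; how to trade off cheap noisy pulls against costly accurate ones is exactly what the paper's fidelity selection procedures address.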
Cooperative Multi-agent Bandits: Distributed Algorithms with Optimal Individual Regret and Constant Communication Costs
Recently, there has been extensive study of cooperative multi-agent
multi-armed bandits where a set of distributed agents cooperatively play the
same multi-armed bandit game. The goal is to develop bandit algorithms with the
optimal group and individual regrets and low communication between agents. The
prior work tackled this problem using two paradigms: leader-follower and fully
distributed algorithms. Prior algorithms in both paradigms achieve the optimal
group regret. The leader-follower algorithms achieve constant communication
costs but fail to achieve optimal individual regrets. The state-of-the-art
fully distributed algorithms achieve optimal individual regrets but fail to
achieve constant communication costs. This paper presents a simple yet
effective communication policy and integrates it into a learning algorithm for
cooperative bandits. Our algorithm achieves the best of both paradigms: optimal
individual regret and constant communication costs.
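
As a toy illustration of how constant communication can coexist with cooperative learning, the sketch below has each agent run local arm elimination and broadcast only when it eliminates an arm, so the total number of messages is bounded by the number of arms rather than the horizon. All names are hypothetical and the trigger rule is a generic stand-in, not the paper's communication policy or its regret analysis.

```python
# Generic stand-in for an event-triggered communication policy; not the
# paper's exact algorithm.
import math
import random

class Agent:
    """Local arm-elimination learner; broadcasts only on elimination events."""
    def __init__(self, K):
        self.active = set(range(K))
        self.est = [0.0] * K
        self.pulls = [0] * K

    def update(self, arm, reward):
        self.pulls[arm] += 1
        self.est[arm] += (reward - self.est[arm]) / self.pulls[arm]

    def try_eliminate(self, t):
        """Return arms eliminated at step t; each event triggers one broadcast."""
        best = max(self.est[a] for a in self.active)
        dropped = set()
        for arm in self.active:
            rad = math.sqrt(2.0 * math.log(t + 2) / max(self.pulls[arm], 1))
            if self.est[arm] + 2.0 * rad < best:
                dropped.add(arm)
        self.active -= dropped
        return dropped

def simulate(means, n_agents=3, horizon=5000):
    agents = [Agent(len(means)) for _ in range(n_agents)]
    messages = 0
    for t in range(horizon):
        for ag in agents:
            arm = random.choice(sorted(ag.active))
            ag.update(arm, means[arm] + random.gauss(0.0, 1.0))
            dropped = ag.try_eliminate(t)
            if dropped:
                messages += 1  # one broadcast per elimination event
                for peer in agents:
                    peer.active -= dropped  # peers adopt the elimination
    # Each arm is eliminated globally at most once, so messages <= K - 1,
    # independent of the horizon.
    return messages

print(simulate(means=[0.2, 0.5, 0.8]))
```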
On-Demand Communication for Asynchronous Multi-Agent Bandits
This paper studies a cooperative multi-agent multi-armed stochastic bandit
problem where agents operate asynchronously -- agent pull times and rates are
unknown, irregular, and heterogeneous -- and face the same instance of a
K-armed bandit problem. Agents can share reward information to speed up the
learning process at additional communication costs. We propose ODC, an
on-demand communication protocol that tailors the communication of each pair of
agents based on their empirical pull times. ODC is efficient when the pull
times of agents are highly heterogeneous, and its communication complexity
depends on the empirical pull times of agents. ODC is a generic protocol that
can be integrated into most cooperative bandit algorithms without degrading
their performance. We then incorporate ODC into the natural extensions of UCB
and AAE algorithms and propose two communication-efficient cooperative
algorithms. Our analysis shows that both algorithms are near-optimal in regret.
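
The following is a hypothetical sketch of an on-demand pairwise exchange rule in the spirit described above: agent i forwards its buffered observations to agent j only when i's new pulls since their last exchange exceed a constant fraction of the pull count j already knows about, so slowly-pulling agents trigger few messages. The threshold rule, the class name OnDemandChannel, and the method names are assumptions, not the actual ODC protocol.

```python
# Hypothetical on-demand trigger rule; not the actual ODC protocol.
from collections import defaultdict

class OnDemandChannel:
    """Pairwise trigger: agent i sends its buffer to agent j only when i's
    fresh pulls exceed a constant fraction of what j has already seen."""
    def __init__(self, n_agents, threshold=0.5):
        self.n = n_agents
        self.threshold = threshold
        self.fresh = defaultdict(int)  # fresh[(i, j)]: i's pulls unseen by j
        self.known = defaultdict(int)  # known[(i, j)]: i's pulls seen by j
        self.messages = 0

    def record_pull(self, i):
        for j in range(self.n):
            if j != i:
                self.fresh[(i, j)] += 1

    def maybe_send(self, i, j):
        """True if i should transmit its buffered observations to j now."""
        fresh, known = self.fresh[(i, j)], self.known[(i, j)]
        if fresh > self.threshold * max(known, 1):
            self.known[(i, j)] = known + fresh
            self.fresh[(i, j)] = 0
            self.messages += 1
            return True  # caller merges i's buffered (arm, reward) pairs into j
        return False

# A slow agent accumulates few fresh pulls, rarely crosses the threshold, and
# so exchanges few messages with its peers.
ch = OnDemandChannel(n_agents=2)
ch.record_pull(0)
print(ch.maybe_send(0, 1))  # True: 1 fresh pull > 0.5 * max(0, 1)
```

Because the trigger depends only on observed pull counts, the number of messages scales with the agents' empirical activity rather than with wall-clock time, which is the flavor of communication complexity the abstract describes.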